Laboratory Replication
of Scientific Discovery Processes


Yulin Qin and Herbert A. Simon
Carnegie-Mellon University

Abstract 


Fourteen subjects were tape recorded while they undertook to find a law
to summarize numerical data they were given. The source of the data was
not identified, nor were the variables labeled semantically. Unknown to the
subjects, the data were measurements of the distances of the planets from
the Sun and the periods of their revolutions about it -- equivalent to the
data used by Johannes Kepler to discover his Third Law of planetary
motion.

Four of the 14 subjects discovered the same law as Kepler did (the
period varies as the 3/2 power of the distance), and a fifth came very close
to the answer. The subjects' protocols provide a detailed picture of the
problem-solving search they engaged in, mainly, but not exclusively, in the
space of possible functions for fitting the data, and provide explanations as
to why some succeeded and the others failed.

The search heuristics used by the subjects are similar to those embodied
in the BACON program, a computer simulation of certain scientific discovery
processes. The experiment demonstrates the feasibility of examining some
of the processes of scientific discovery by recreating in the laboratory
discovery situations of substantial historical relevance. It demonstrates also
that under conditions rather similar to those of the original discovery, a law
can be rediscovered by persons of ordinary intelligence (i.e., the intelligence
needed for academic success in a good university). The data for the
successful subjects reveal no "creative" processes in this kind of
discovery situation different from those that are regularly observed in all
kinds of problem solving settings.

In 1618, Johannes Kepler discovered his Third Law of planetary motion: The
cube of a planet's distance from the Sun is proportional to the square of its period
of revolution, or

D^3/P^2 = C,

where D is the distance, P the period, and C a constant. This discovery,
along with Kepler's laws of elliptical orbits and equal areas, paved the way for
Newton's discovery of the law of universal gravitation, from which Kepler's laws can
be deduced logically.

The discovery of the Third Law provides a setting for the study of some of the
processes that people (scientists) use to find regularities in data, especially in the
frequent circumstances where there exist no bodies of relevant theory to guide the
search. In this instance, as in many others in the history of natural science, the
discovery requires an induction directly from the data without help from pre-existing
theory. Data-driven discovery of this kind has been simulated by the BACON
program (Langley, et al., 1987), which, using a few simple heuristics, rediscovered
Kepler's Third Law, as well as Ohm's Law of electrical currents, Black's Law of
temperature equilibrium, and a substantial number of other important laws of 18th and
19th century chemistry and physics. Langley, et al. (1987) also discuss the
significance of data-driven discovery in the overall progress of science.

The purpose of the experiments described in this paper was to compare human
data-driven discovery processes with the processes embodied in BACON, and thereby
to determine their similarities and differences. Do humans use the same heuristics
as BACON when they are confronted with the Kepler data? Unfortunately it is too
late to take a protocol from Kepler, and he left behind only a minimal record of how
he found the Third Law. As possibly inadequate substitutes for Kepler, we recruited
college students for two closely similar experiments. The data we obtained from these
experiments give us evidence of how human subjects respond to their task and how
their methods compare with BACON's, and perhaps cast some light on the history of
scientific discovery in the case of Kepler's Third Law. We will first describe the
experiments and their results, next comment upon their significance for the
psychology of discovery viewed as problem solving, then ask what light they may cast
on Kepler's discovery.

Experiment  1 

Method  and  Material 

The data used in this experiment were the average distances from the Sun,
and the periods of revolution about the Sun, of Mercury, Venus, Earth, Mars, and
Jupiter, obtained from the 1986 World Almanac. Kepler, in the 17th Century, used
only slightly less precise data. (See Harmonies of the World, 1619/1952, chs. 3, 4.)
The data given to the subjects (Table 1) were not identified by source, and the
variables were labeled "s" and "q" (instead of "Distance" and "Period") so as not
to reveal their meaning.

Insert  Table  1  about  here 


The experiment generally lasted about 1 hour unless S solved the problem in a
shorter time. Subjects were allowed to use pen, scratch paper, and a calculator that
had multiplication and division operators as well as exponential and logarithmic
functions. (The experimenter brought a calculator into the experiment room. However,
we allowed the subject to use his/her own calculator if he/she preferred.) The
subjects were instructed as follows:

We are interested in how a human being discovers a scientific
law. This experiment is not designed to test your problem
solving ability. It is simply to discover what methods you
would use to build a formula describing the relationship
between two groups of given data.

In  order  to  follow  your  thoughts  we  ask  that  you  think  aloud, 
explaining  each  step  as  thoroughly  as  you  can. 

The  data  will  be  presented  on  another  sheet  of  paper,  and  you 
should  begin  by  reading  the  data  aloud. 

After finishing the experiment, Ss were asked if they could identify the law that
fitted the data. None identified it as Kepler's law, nor is there any indication from
their protocols that they were aware of the meaning of the data or the law that
described them. So, while they may have previously encountered Kepler's law in their
physics courses, there is no reason to think that memory assisted them in solving the
problem.

Subjects 

Nine subjects took part in Experiment 1. Their academic status is shown in
Table 2. Five were undergraduates, all of whom had taken or were taking courses
in physics, calculus, and chemistry; one was a graduate student in physics, one an
engineer, one a graduate student in art history, and one a graduate student in
education.

Insert  Table  2  about  here 


Problem  Analysis 

The structure of the problem can be illuminated by observing how BACON
attacked it. The BACON program found Kepler's third law in about two minutes on a
medium-size computer. It did not search "the space of all possible functions," but
used the few simple heuristics shown in Table 3 to guide its search. Following
these heuristics, it first constructed (Heuristic 4) and tested the function P/D = C,
without success. This led it (again by Heuristic 4) to construct and test P/D^2 = C,
also without success. Then, using Heuristic 5, it constructed P^2/D^3 = C, and
concluded that this function fit the data satisfactorily.1

Insert  Table  3  about  here 


The basic idea underlying BACON's success is to notice when two variables
are increasing and decreasing together, and then to test their ratio. (If one
increases while the other decreases, test their product instead.) Repeated application
of this principle to the original data and to the new functions derived from them
quickly produces the desired function. It is interesting to note that if the test of
(approximate) equality in BACON is loosened, it will be satisfied with the second
function it finds, P/D^2 = C, and will stop there. So did Kepler, who was satisfied
with the inverse square law for about ten years, until he took up the problem anew
to see if he could get a better fit to the data!
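The ratio-or-product principle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BACON's actual code: the data values are those that appear in SY's protocol later in this paper (Table 1 itself is not reproduced here), the 2% constancy tolerance is an arbitrary choice, and the rule for choosing which earlier term to pair with the newest one is a simplification introduced for this sketch.

```python
# Values as they appear in SY's protocol (assumed to match Table 1).
S = [36.0, 67.25, 93.0, 141.75, 483.8]   # "s" (distance)
Q = [88.0, 224.7, 365.0, 687.0, 4332.0]  # "q" (period)


def values(term):
    """A term (a, b) stands for s**a * q**b, evaluated on every data point."""
    a, b = term
    return [s ** a * q ** b for s, q in zip(S, Q)]


def trend(xs):
    """+1 if strictly increasing, -1 if strictly decreasing, else 0."""
    if all(y > x for x, y in zip(xs, xs[1:])):
        return 1
    if all(y < x for x, y in zip(xs, xs[1:])):
        return -1
    return 0


def is_constant(xs, tol=0.02):
    """Approximate-equality test; the tolerance is an assumption."""
    mean = sum(xs) / len(xs)
    return all(abs(x - mean) <= tol * abs(mean) for x in xs)


def combine(new, old):
    """Ratio new/old if the terms vary together, product if they vary oppositely."""
    t_new, t_old = trend(values(new)), trend(values(old))
    if t_new == 0 or t_old == 0:
        return None
    sign = -1 if t_new == t_old else 1
    return (new[0] + sign * old[0], new[1] + sign * old[1])


def bacon_like_search(max_steps=6):
    """Return the sequence of terms constructed until one is constant."""
    terms = [(1, 0), (0, 1)]  # start from s and q themselves
    path = []
    for _ in range(max_steps):
        newest = terms[-1]
        candidates = []
        for old in terms[:-1]:
            c = combine(newest, old)
            if c and c != (0, 0) and c not in terms and c not in candidates:
                candidates.append(c)
        if not candidates:
            break
        for c in candidates:
            if is_constant(values(c)):
                return path + [c]
        terms.append(candidates[0])
        path.append(candidates[0])
    return path


print(bacon_like_search())
```

On these data the sketch constructs q/s, then q/s^2, then q^2/s^3, which parallels the sequence P/D, P/D^2, P^2/D^3 reported for BACON. Loosening `tol` far enough would make it stop at an earlier term.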

Evoking heuristics like those in BACON, and proceeding along the lines
sketched above, is only one way to solve the problem. One of the other ways is to
take logarithms of the quantities s and q, whereupon the law becomes:

log q = (3/2) log s - K, where K is (log C)/2.

BACON's third (linear) heuristic (Table 3) would find the law immediately from
these log-transformed data. Yet another way is to try the square root of s^3:

s^(3/2)/q = K1, where K1 is C^(1/2).
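For readers who want the intermediate steps, both alternative forms follow from the law written in the subjects' variables, s^3/q^2 = C. The derivation below is a worked restatement of that algebra (logarithms to any fixed base, amsmath assumed); it adds nothing beyond what the two formulas above already state.

```latex
\begin{align*}
\frac{s^{3}}{q^{2}} = C
  &\;\Longrightarrow\; 3\log s - 2\log q = \log C \\
  &\;\Longrightarrow\; \log q = \tfrac{3}{2}\log s - K,
     \qquad K = \tfrac{1}{2}\log C, \\[4pt]
\frac{s^{3}}{q^{2}} = C
  &\;\Longrightarrow\; \frac{s^{3/2}}{q} = K_{1},
     \qquad K_{1} = C^{1/2}.
\end{align*}
```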

Behavior  of  Subjects 

Finding the law is not easy for human subjects. A freshman (S3) and the
physics graduate student (SY) found the law, and a junior electrical engineering
student (S4) came very close; the others failed to find it (see Table 2).


1 See Langley, et al., p. 85. Various versions of BACON will try slightly different search paths, but
none will need more than a half dozen tries to find Kepler's law.

Data-driven scientific discovery is a kind of ill-structured problem solving.
Although subjects can, in principle, use means-ends analysis, etc., to solve the problem,
in general, they don't know where the goal is, and don't know the distance between
their current solution attempt and the goal. Sometimes, the current solution attempt
is very near the goal, but they miss it. For example, in SW's protocol, we find:

How about s^/,L?

It's too complex.

She didn't check her hypothesis, but turned to (q_{i+1} - q_i)/(s_{i+1} - s_i) instead.

In this experiment, subjects encountered at least three specific difficulties:

1. The relation between q and s is nonlinear. Three subjects, S5, SG, and
SJ, failed because they only tried linear relations. (Note that the latter
two subjects were the least sophisticated, mathematically, of the nine.)
Other subjects, for example, SW, spent a great deal of their time in
unsuccessful efforts to find a linear relation.

2. If we write the law in the form q = f(s), we get a non-integral power of
s, 3/2. SW failed to solve the problem through not testing non-integral
powers, and S1 found no systematic way to arrive at the correct power.

3. The constant coefficient in the law is not unity. This was at the root of
the failures of S1, S2, and S4, who neglected to include the coefficient
in the functions they were considering.

Let us examine the search strategies that the subjects employed. Their
protocols show them generating a sequence of functions and testing these functions
against the data. In their fitting of functions, two motives were in evidence: a
function might be fitted because it was hypothesized to be the correct one, or it
might be fitted simply to gain information about the shape and trend of the data. It
is not always easy to determine from the protocols which motive, or combination of
them, is operative.

In many cases, a subject considered a particular function, dropped it for
another, and at some later time returned to it. Table 4 shows the principal types of
functions that the six subjects who went beyond linear functions generated and
examined, and the number of times each function was considered by each subject.
Some extreme cases of persistence are S4's examination of linear functions seven
times, and S2's examination of sequential functions seven times, but the other
subjects are not far behind. All of the six subjects tried linear, sequential, and
quadratic functions at least once; four tried log functions, three tried cubic functions,
and two, others. A total of about 32 different functions in these categories were
considered by one or more subjects.

Insert  Table  4  about  here 


The  function  types  recorded  in  the  table  are  defined  as  follows: 

1. Linear. These are relations like q/s, q ■ s, s/C, or q/C, and so on.

2. Sequential. These are functions that relate successive values of q, e.g.,
q_i with q_{i+1}, or s_i with s_{i+1}. Such functions may arise in either of two
ways. The subject may be considering first differences of the variables
(taking differences between successive values), or may be thinking of
possible sequential patterns of the values of each separate variable.2
When subjects considered both a function of q and a function of s
simultaneously, this is counted as one occurrence in Tables 4 and 6;
while if they considered only a function of q or a function of s, this is
counted as 0.5.

3. Quadratic. These include functions like s^2, s^2/q, s^2 + bs + c = q, and
so on.

4. Logarithmic. These are functions like (log s)/(log q), log(s/q), and so on.

5. Cubic. These are functions like s^3/q, s^3/q^2.

6. Other. Among these, we find functions like s*, s^(1/2), and q^(1/2).

2 Kepler came to the problem of relating period to distance after a long period of search for a
pattern of the successive distances of the planets, which included his famous proposal for relating
those distances to properties of the regular solids.

One other manipulation of the data that should be mentioned is S3's rounding
of the results of computations to simpler numbers like 5/2, 10/3, 4, 5, 9. This
abstraction process made it easier for S3 to find trends in the data and ultimately to
discover the law.
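S3's rounding can be imitated mechanically. The sketch below is an illustration only: S3 rounded by eye, whereas the code uses Python's `Fraction.limit_denominator`, and both the cap of 3 on the denominator and the data values (taken from SY's protocol, since Table 1 is not reproduced here) are assumptions chosen for this example. It applies the rounding to the ratio q/s.

```python
from fractions import Fraction

# Values as they appear in SY's protocol (assumed to match Table 1).
S = [36.0, 67.25, 93.0, 141.75, 483.8]   # "s"
Q = [88.0, 224.7, 365.0, 687.0, 4332.0]  # "q"

# Raw ratios q/s are awkward decimals such as 2.444..., 3.341..., 3.924...
ratios = [q / s for s, q in zip(S, Q)]

# Replace each ratio by the closest fraction whose denominator is at most 3.
# The cap of 3 is an assumption, not something stated in the protocol.
simple = [Fraction(r).limit_denominator(3) for r in ratios]

print([str(f) for f in simple])
```

With these inputs the rounded sequence is 5/2, 10/3, 4, 5, 9, the same simpler numbers quoted above.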

Most of the subjects made use of some kind of diagram: scatter diagrams of
the data, or rough graphs of the functions they were considering. Table 4 shows
that S1 used graphs, and all subjects except S1 and S3 used scatter diagrams. (A
"0.5" in Table 4 refers to an unfinished scatter diagram.)

Table 5 shows the percentages of the function references belonging to
the various types, for each subject, and in total. Table 6 provides more detail on
percentages of references to the seven functions that subjects considered most
frequently, and that account collectively for more than half of all the references.

Insert  Table  5  about  here 


Insert  Table  6  about  here 


From  these  data  we  can  draw  a  number  of  generalizations. 

Linear functions were considered most frequently. From Tables 4 and 5,
we can see that about 28.6% of function references were to linear functions
(excluding the three subjects who considered only linear functions). Sequential
functions are next most often considered, then quadratic functions, then logarithmic
functions.

Simple functions were considered more frequently than complex ones.
Although about 32 different functions were considered by one or more subjects, we
can see from Table 6 that the seven simple functions listed there account for 52.4%
of all references. Of the 38.5 references to these seven functions, 18, or 46.8%,
are to the two linear functions (the first and third columns).

There are large individual differences in the functions considered. From
Table 4 or Table 5, we can see that S1 gave almost equal consideration to functions
in each of five categories. S2 preferred logarithmic and sequential functions. S3
restricted his consideration almost wholly to linear and quadratic functions. S4
thought mostly about linear and "other" functions. SY considered five types of
functions, and SW four types. The three subjects excluded from the table
considered only linear functions.

Diagrams were used extensively. From Table 4, we see that almost every
subject used diagrams, usually scatter diagrams, but in one case, graphs.

These data give some picture of the diversity and similarities of behavior among
the subjects, but do not explain immediately why some subjects were successful, and
others not. A closer look at the protocols of the two successful subjects will give us
a better picture of what they did. Our descriptions of behavior are based on
problem behavior graphs (Newell and Simon, 1972) constructed from the protocols of
each subject. In constructing the problem behavior graphs we distinguished nodes
at which subjects mentioned functions or facts they had observed about the data
from nodes at which subjects commented on their thought processes (meta-nodes).
Nodes were numbered in the order in which the subject reached them. To illustrate
the method, let us examine the latter portion of the protocol of the successful
subject SY.

Protocol  of  Subject  SY 

1. You said don't use logarithm? Ok. Try something else.
2. Try a simple function.
3. The simplest one is square, x square.
4. Check if their squares fit or not.
5. 88^(1/2) = 9.38
6. (36/4)^2 ≈ 88
7. (67.25/4)^2 = 282.
8. The difference (between 282 and 224.7) is big.
9. Again, (93/4)^2 ...
10. No, it's wrong.
11. The difference is too big.
12. The square increases too fast.
13. So, try s^3 and q^2.
14. 36^3/88^2 = ... The easier way (using this calculator) is 88^(3/2) = 19.87.
15. 88^(2/3)/36 = 0.54(96)
16. 224.7^(2/3)/67.25 = 0.55
17. 365^(2/3)/93 = 0.55
18. 687^(2/3)/141.75 = 0.55
19. It looks not bad.
20. 4332^(2/3)/483.8 = 0.55.
21. It seems that it is this kind of relationship.
22. E: Write it down.
23. S: That is, s cube is in direct ratio to q square.

Sentences (1) - (6) show that SY changed his search direction from trying
logarithmic functions to trying quadratic functions and found that (s_1/4)^2 ≈ q_1. These
sentences form node 22 of his problem behavior graph (see Appendix). To show the
change of search direction, we put node 22 in two places in the PBG. The first
node, after node 21, shows that the direction is changed after trying node 21. The
second one, after node 8, shows that now the search is for quadratic functions
again. To connect these two locations of node 22, we label each of them, in
parentheses, with the coordinate of the other, and insert an arrow pointing forward or
backward, respectively.

In sentences (7) - (9) SY calculates (s_2/4)^2 and (s_3/4)^2, and compares them with
q_2 and q_3. Sentences (10)-(12) report the result of the comparison and the trend of
the data (Nodes 23 and 24). Sentence (13) proposes a new function. This is a
meta-node, indicated by a dotted box instead of a solid box. It is not certain that
SY had built a new hypothesis, s^3 = cq^2, at this time: he might have been trying to
see the trend of s^3/q^2.

After he had found the rule, he said retrospectively, "When I tried the square,
one variable (s) increased very fast, the other, q, increased very slowly. I tried to
adjust them, increasing one (q) and decreasing the other (s). At first I tried the
direct ratio of s^2 to q. Then I added 1 to the power of s and 1 to the power of q.
In this way I made them harmonious." The heuristic he used here is essentially the
same as Heuristic 5 in BACON.

Sentences (14)-(18) form node 26, in which SY tried (q_i)^(2/3)/s_i, i = 1, ..., 4.
Sentence (19) forms node 27. Sentence (20) forms node 28. Sentence (21) forms
node 29. It seems that SY formed a new hypothesis, q^(2/3) = ks, in nodes 26 and 27.
He tested it again in node 28 and confirmed that the rule is s^3 = cq^2 in node 29.

To summarize the entire protocol of SY, there are three phases:
understanding, initial search, and search in depth.

1. Understanding (Nodes 1 to 6). Initially, SY read and characterized the data.
He observed that they were not linear, but that both q and s were monotone
increasing.

2. Initial search (Nodes 7 to 15). During this segment, SY searched in breadth
for a suitable function. After examining the scatter diagram, he chose four types of
functions: quadratic, exponential, sequential, and logarithmic.

3. Search in depth (Nodes 16 to 29). In Nodes 16 to 21, SY sought to estimate
the parameter for the ratio of log q to log s. Obtaining an estimate, he returned, in
Nodes 22-28, to the quadratic, and applied the equivalent of BACON's fifth heuristic
to relate powers of q to powers of s. In this way, he found the correct function.

Model of SY's Behavior. From the problem behavior graph and the analysis
of SY's process, we can describe his rediscovery of Kepler's law in terms of the
following models, expressed both at a very general and at a more detailed level. At
the most general level, his behavior fits the model proposed by Simon and Lea
(1974); see Figure 1. SY searches both in a space of instances (the data) and a
space of rules (the hypothesized functions). Information in the instance space (the
scatter diagram and the numerical parameters he calculates) suggests functions in
the rule space for examination. Manipulation of the functions (fitting them to the
data) provides new information in the instance space.

Insert  Figure  1  about  here 

The  productions  of  the  more  detailed  model  are  shown  in  Figure  2.  The  major 
part  of  this  model  consists  of  productions  for  searching  two  levels,  function  types  and 
parameters,  in  the  space  of  rules. 

Insert  Figure  2  about  here 


Sources of Information. SY and the other subjects obtain new information in
four ways: as a result of checking hypotheses, from comparing values of the given
data or transformed data, from comparing the trend and shape of data, and from
their diagrams. From this information, subjects can make decisions about changing
the type of function they are considering, changing a function parameter, performing
some operation, or applying a heuristic. Subjects do not always consider the
function type and the function parameters separately. Sometimes they choose a
specific function, and if, upon checking it, the result is not satisfactory, they then
choose a function of a different type.

Diagram. Subject SY drew a diagram. How did he use it? Upon reading
the data, he noted that the relation between s and q was not linear, but was
monotone. For more information, he constructed the scatter diagram, and used it
to decide that the most likely function types were the quadratic, exponential,
sequential, and logarithmic.

Feedback. Most important to SY's success was the way in which he used
feedback from the instance space to the space of hypotheses. Beginning at node
22, SY computed s^2, but found that it increased much faster than q. Since he had
noticed that s increases more slowly than q, he multiplied s^2/q by s/q (Heuristic 5 of
BACON), obtaining a constant. In other words, from fitting the quadratic, he not only
discovered that this was not the correct function, but he also learned in what way it
deviated from constancy, enabling him to choose a plausible corrective. From this
example, we see that a procedure like Heuristic 5 of BACON is not ad hoc, but is a
logical derivative of means-ends analysis.

Inefficiency in Search. SY sometimes fails to use direct methods that are
surely within his mathematical repertoire. At node 18, he needed to find the
coefficient C and the constant k for the log-linear relation:

log q = C log s + k.

He could have found these parameters by solving the simultaneous equations
obtained from two of the data points. Instead of doing this, he tried to guess the
value of C, and failing, he returned to considering the quadratic function.
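The direct method mentioned here amounts to two equations in two unknowns. The sketch below is a hedged illustration, not a reconstruction of anything SY did: it assumes the relation has the form log q = C log s + k with base-10 logarithms, uses the first and last data points as they appear in SY's protocol, and the function name `fit_log_linear` is hypothetical.

```python
import math


def fit_log_linear(point_a, point_b):
    """Solve log10(q) = C * log10(s) + k exactly through two (s, q) points."""
    (s_a, q_a), (s_b, q_b) = point_a, point_b
    # Subtracting the two equations eliminates k and leaves the slope C.
    C = (math.log10(q_b) - math.log10(q_a)) / (math.log10(s_b) - math.log10(s_a))
    # Substituting C back into either equation gives the intercept k.
    k = math.log10(q_a) - C * math.log10(s_a)
    return C, k


# First and last data points as they appear in SY's protocol (assumed values).
C, k = fit_log_linear((36.0, 88.0), (483.8, 4332.0))
print(round(C, 3), round(k, 3))
```

With these two points the slope comes out very close to 3/2, which is the exponent in the law; other pairs of points would give slightly different estimates because the data are rounded.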

Best-First Search. From the PBG as a whole, we would conclude that SY
conducted a best-first search, although his criteria for choice among different
continuations are not always evident. For example, considering linear, quadratic, and
logarithmic functions, he examined the linear (simplest?) functions first, then switched
to the logarithmic (most promising?). Only when he had failed to find a fit of the
logarithmic functions did he return to the quadratic.

Use of BACON-like Heuristics. We have already noted that SY used the
equivalent of BACON's Heuristic 5 in the last step of his solution. Of course he, as
well as all the other subjects, used Heuristic 1 (Find a law); that was part of the
task instructions. Every subject also knew that finding a law was equivalent to
finding a constant function (Heuristic 2). SY also used Heuristic 3 -- fit a linear
function -- in his examination of the logarithmic function, although he did not succeed
in finding the correct slope and intercept.

Protocol  of  Subject  3 

The same model of the discovery process that fits SY's protocol also fits the
protocol of S3. However, there are some details of the process that are different in
the two protocols.

Breadth-first Search. S3's protocol is much longer than SY's. A simplified
abstract of the PBG is shown in Figure 3. S3 considers four types of functions,
linear, f(q/s), quadratic, and sequential patterns, moving from one to another whenever
he feels he is not making progress, and revisiting each several times. The PBG
gives the appearance of breadth-first search, but the criteria for switching from one
branch to another are not evident.

Insert Figure 3 about here


Use of Abstraction. Data abstraction played an important role in S3's finding
regularities. In step 15 he re-examined the result, from step 6, of computing q/s,
and then simplified these numbers to 5/2, 10/3, 4, 5, 9. In step 21, he similarly
abstracted the results of computing s^2/q to 3, 4, 5, 6, and 11. Finding that these
two sequences were very close to each other, he evoked Heuristic 4 and solved the
problem. This is a clear example of feedback in his search.

Manipulation of Functions and Data. An important difference between SY and
S3 is that the former searched mainly in the space of functions, using the data to
test his hypotheses, while the latter manipulated the data, and used abstraction
(instead of a diagram) to find the regularities in the data.

Unsuccessful  Subjects 

We discuss next the four other subjects, S1, S2, S4, and SW, who progressed
beyond linear functions. From their PBGs it can be seen that their behavior fits the
general model of the discovery process that we used for SY and S3. The obvious
difference between the unsuccessful and the successful subjects lay in their search
strategies and use of heuristics.

Characteristics of Search. The search of the unsuccessful subjects was
characterized by shallowness, poor information feedback, and frequent repetition.

1. Some of the unsuccessful subjects, e.g., S1, S2, and S4, searched the
function type space quite widely, but they did not pursue the search for
parameters of the functions systematically.

2. Some of the unsuccessful subjects, e.g., S1, S2, and SW, obtained little
more than a "yes-no" answer from their attempts to fit functions, instead
of gaining information about the nature of the discrepancies that might
guide the next steps of search. They failed to call productions 5 and 12
of Figure 2 and often called productions 4 and 13. Hence, much of
their search could be described as "one-step search."

3. Some of the unsuccessful subjects, for example SW, proposed many
hypotheses without examining any but the easiest ones carefully, and
often repeated hypotheses that had failed before. This is a further
reflection of the lack of informative feedback to guide search.

Use of Heuristics. All the subjects, as we have seen, used BACON's
Heuristics 1, 2, and 3. However, the unsuccessful subjects did not use Heuristics 4
and 5 appropriately. For example, at step 2, S1 found s/q to be a decreasing
function and wanted to try s^2/q. But he only observed, from s_i^2/q_i, that the latter
function increased "much faster," and then shifted his attention to the logarithmic
functions. The unsuccessful subjects did not use the heuristics systematically. For
example, after applying Heuristic 4 to s and q, and getting q/s, SW found q/s increasing.
However, this result to her only meant that q/s was not a constant. She did not
continue to use Heuristic 4 or 5. Instead, she went on to try s_{i+1} - s_i, q_{i+1} - q_i, and
never made one step more along this direction.

Particular Characteristics of S1. While S2, S4, and SW used scatter
diagrams to help choose an appropriate function type, S1 drew a graph of the
function y = s^2 to see if there was a quadratic relation between s and q. Toward
the end of his experiment, he recalled the quadratic formula in physics for the
acceleration of a falling body, and checked to see if it matched the given data. His
style of trying everything in his mental repertory is reminiscent of the phenomena
studied by repair theory (VanLehn and Ball, 1987).

Summary  of  Experiment  1 

In Experiment 1, nine subjects tried to rediscover Kepler's third law. Two
succeeded, while the others failed. Three subjects who failed lacked the
mathematical knowledge necessary to find the law. The protocols of the other six
subjects, successful and unsuccessful, all fit a basic model, a particularization of that
of Simon and Lea (1974).

All the subjects used heuristics like BACON's. Heuristics 1, 2, and 3 were
used by everyone. Heuristics 4 and 5 were also used frequently, although not to the
same extent by all subjects. The successful subjects proceeded relatively
systematically, and obtained relevant information by feedback from the search. The
unsuccessful subjects were less systematic, and less able to obtain information from
their tests of hypotheses. As a result, they were not able to use Heuristics 4 and 5,
which depend on feedback, systematically and successfully.

Experiment  2 

Method  and  General  Results 

In Experiment 1, the subjects were allowed to use calculators that had
operations for computing exponentials and logarithms. Logarithms appeared upon the
scientific scene at just about the time that Kepler discovered his Third Law, and in
subsequent years Kepler himself played an active role in their further development.
Nevertheless, although Kepler learned about logarithms within a year of his discovery
of the Third Law, the weight of evidence is that he did not use them in the
discovery. We decided, therefore, that we should run a second experiment in which
the calculators available to the subjects had no exponential or logarithmic functions.
In all other respects, the second experiment was identical with the first.

The change in availability of computing aids had two consequences. First,
the subjects' speeds of calculation decreased slightly. (For example, they had
to do x*x*x to compute x^3.) Second, they now could not calculate square roots.
We gave them access to tables of roots to overcome this second difficulty.
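
The restriction is less severe than it may seem, because a candidate power law can be
checked with multiplication and division alone. The following is a minimal sketch, not
part of the original study: it assumes the Table 1 data (s = distance, q = period) and
tests whether s^3/q^2 is nearly constant using only operations available on the
restricted calculators. The function names are hypothetical.

```python
# Illustrative sketch: checking the 3/2 power law with multiplication and
# division only, as a calculator without exponential or logarithmic keys
# would require. The data are the (s, q) pairs of Table 1.
S = [36.0, 67.25, 93.0, 141.75, 483.8]     # distances
Q = [88.0, 224.7, 365.3, 687.0, 4332.1]    # periods


def cube(x):
    """x^3 computed as x*x*x."""
    return x * x * x


def square(x):
    """x^2 computed as x*x."""
    return x * x


def kepler_ratios(s_values, q_values):
    """Return s^3 / q^2 for every observation."""
    return [cube(s) / square(q) for s, q in zip(s_values, q_values)]


def relative_spread(values):
    """(max - min) / mean: a crude measure of how far from constant a term is."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


if __name__ == "__main__":
    ratios = kepler_ratios(S, Q)
    print([round(r, 3) for r in ratios])
    print("relative spread:", round(relative_spread(ratios), 4))
```

On these data every ratio falls close to 6.03, in line with the constant that S3
reported in Table 2 (s^3/6.025 = q^2).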

Table  7  describes  the  subjects  and  their  best  approximation  to  the  law  they 
were  seeking.  Two  of  the  five  subjects  were  successful,  three  unsuccessful. 

Insert  Table  7  about  here 


Behavior of Subjects

Corresponding to Tables 4, 5 and 6, respectively, in Experiment 1, Table 8
shows the principal types of functions that the five subjects in Experiment 2
generated and examined, and the number of times each function type was
considered by each subject; Table 9 shows the percentages of the function
references belonging to the various types, for each subject and in total; Table 10
provides more detail on percentages of references to the six functions that subjects
in Experiment 2 considered most frequently, and that account collectively for more
than half of all the references.

Insert  Table  8  about  here 


Insert  Table  9  about  here 


Insert  Table  10  about  here 


Comparing the corresponding tables of the two experiments, we see that the
results of the two experiments are generally consistent: linear functions were
considered most frequently, simple functions were considered more frequently than
complex ones, there were large individual differences in the functions considered, and
diagrams were used extensively. Of course, there are a few differences. In Table 8
the total references per subject (10) is a little less than that in Table 4 (12.2). One
reason may be that the calculation took more time in Experiment 2 than in
Experiment 1 because of the change in calculators. In Experiment 2, there were a
few more references to quadratic and cubic functions than in Experiment 1, although
the differences are not significant by t-test. Perhaps the awkwardness of exponential
and logarithmic computations caused the subjects to try more quadratic and cubic
functions in Experiment 2. Nevertheless, none of the differences between Table 4
and Table 8, and between Table 5 and Table 9, are large. The differences between
Table 6 and Table 10 look a little larger. In Table 6, there are seven functions, but
in Table 10, there are only six. Four functions (s^x, ln q/ln s, q-s, and s_{i+1}-s_i, q_{i+1}-q_i) in
Table 6 do not appear in Table 10. Instead, there are three new functions in Table
10 (s^2 + bs + c = q, s^2/q^2, and s^3 or s^3/q). The disappearance of s^x and ln q/ln s can,
obviously, be attributed to the changing of the calculators. The others might be
caused by the large individual differences among the subjects.

The  Successful  Subjects,  S8  and  S9 

From the PBGs of S8 and S9, the two subjects who found the law, it appears
that their search models closely resembled those of SY and S3 in the first
experiment. They searched relatively systematically, obtained feedback from their
tests of hypotheses, and used the feedback to guide further search.

Subject S8 manipulated the data, examining only a few functions, and found
the law very quickly. His style of search resembled that of S3. Most of his
manipulation consisted in computing functions of s, then comparing these with q. He
used Heuristic 4 in combination with hill-climbing search (successive approximation),
and solved the problem without the help of a diagram.

Subject S9 selected linear and quadratic equations, then constructed a scatter
diagram of the data. Next she chose the function q = as^3. After observing the
behavior of this function, she used Heuristic 5 to find the solution.

Three  Unsuccessful  Subjects 

Subjects 6 and 10 searched over sets of functions rather unsystematically,
without effective feedback of information. For example, S6 examined all of these
simple functions:

(1) s multiplied or divided by a number

(e.g., s*2.5, 3/s)

(2) the sum of s and q

(3) differences

(e.g., q-s, (q_2-q_1)/(s_2-s_1))

(4) the product of q and s

(5) the ratio of q to s

(e.g., q/s, (q/s)^2)

These  searches  were  generally  carried  to  only  one  step  in  depth. 

S7 proceeded more systematically than S6 or S10, and obtained feedback that
he used to guide his search. He searched by selecting successive functions, but
failed to solve the problem after having spent an hour and a half. He failed
because he did not sufficiently explore simple functions, but tried complex ones such
as the hyperbola, and derivatives and integrals.

Summary  of  Experiment  2 

The behavior observed in Experiment 2 is wholly consistent with that in
Experiment 1. Even without access to a calculator for logarithmic functions, two
subjects succeeded in rediscovering Kepler's third law. The searches covered a
somewhat narrower range of functions than were covered by the subjects in
Experiment 1.

Perhaps most interesting was the demonstration, in S7's failure to solve the
problem, that the effective use of feedback and systematic search are necessary, but
not sufficient, conditions for success.

Heuristics 

In the experiments, we have seen that the subjects employed numerous
heuristics for searching function types and the parameters. We now summarize their
heuristics and compare them with BACON's heuristics.

Supervisory  heuristics 

1. Try simple functions first.

For example:

(1) Begin by checking linear functions.
(All subjects except S10, who began by constructing a
scatter diagram.)

2. If simple functions don't work, try more complex ones.

For example:

(1) If linear functions don't work, try quadratic functions.
(S3, S6, S7, S8, S9)

(2) If quadratic functions don't work, try cubic functions.
(SW, S7)

(3) If cubic functions don't work, try more complex functions.
(SW)

3. In trying complex functions, try the simplest first.
(S1)

4. If complex functions don't work, try simpler functions.
(S1, S3, S4, SW, S7)

5. If the function looks too complex, don't check it in detail.
(SW and other subjects)

6. If you find some trends in the data, persist in using them.
(Successful subjects, S7)

7. Use one or two of the pairs of observations to conjecture a
formula and test it by other pairs.
(All of the subjects)


That  linear  functions  were  considered  most  frequently  and  simple  functions  more 
frequently  than  complex  ones,  can  be  explained  by  subjects'  employing  the  heuristics 
mentioned  above. 

Operation  heuristics. 

1. BACON's Heuristic 4.

(1) If s increases as q increases,
then try s_i/q_i. (S1)

(1') If s increases as q increases,
then try q_i/s_i.
(S2, S3, S4, SY, SW, S6, S7, S9)

(2) If q/s increases as s^2/q increases,
and the values are very similar,
then try (s^2/q)/(q/s), i.e. s^3/q^2. (S3)

(3) If s^2 increases as q increases,
then try s^2/q. (S3, S8)

(4) If s^2/q increases as q increases,
then try (s^2/q)/q, i.e. s^2/q^2. (S1)

(5) If ln s increases as ln q increases,
then try ln q/ln s. (S2)

(6) If q-s increases as s increases,
then try (q-s)/s. (S4)

(7) If q_{i+1}-q_i increases as s_{i+1}-s_i increases,
then try (q_{i+1}-q_i)/(s_{i+1}-s_i). (SW, S6)

(8) If s increases as s^2/q increases,
then try s/(s^2/q). (S8)

(9) If (q_{i+1}-q_i)/(s_{i+1}-s_i) increases as s_{i+1}-s_i increases,
then try ((q_{i+1}-q_i)/(s_{i+1}-s_i))/(s_{i+1}-s_i). (S10)

Some are more complex:

(10) If q/s increases as s increases,
then try (q_{i+1}/s_{i+1} - q_i/s_i)/(s_{i+1}-s_i). (S9)

(11) If q_{i+1}-q_i increases as s_{i+1}-s_i increases,
then try ln((q_{i+1}-q_i)/(s_{i+1}-s_i)). (S2)

2. BACON's Heuristic 5.

(1) If s^2 increases much faster than q,
and s increases more slowly than q,
then try s^2/q^2. (SY)

(2) If q increases, but not as fast as s^2 (i.e. q/s^2 decreases),
then try q^2/s^2. (S9)

(3) If s_i/q_i decreases as s_i increases,
then try s_i^2/q_i. (S1)

3. Hill-climbing combined with BACON's Heuristic 4.

By hill-climbing, we mean repeating a transformation if it produces
a more nearly constant function, reversing it if it leads away from
constancy.

For  example: 

(1)  If  s2/q  increases  then  check  s2/q,  and  if 
s2/q  increases  faster  than  s2/q 

then  try  (s2)*/,-/q.  (S8) 

(2)  If  q/s  increases  then  try  (q/s)2,  and  if 
(q/s)*  increases  faster  than  q/s, 

then  try  (q/s)3,22.  (S6) 

4.  Other  heuristics  aimed  toward  constancy. 

(1)  Division. 
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For  example: 

(a) If s_i increases as q_i increases

then try s_i/i, q_i/i. (S2)

(b)  If  s  increases 

then  try  s/100,  s/1000.  (S3) 

(c) If s^2 > q

then check whether s^2/q = c. (S1, SY)

(d)  If  s,  s  and  q  increase 

then  try  (s2-q)/s.  (S8) 

(2)  Subtraction. 

For  example: 

(a) If q > s then try q-s. (S3, S4, SW, S6)

(3)  Square  root. 

For  example: 

(a) If q/s increases

then try q^{1/2}/s. (S6)


(4)  Logarithm. 

For  example: 

(a) If q increases as s increases, and q > s

then try log q, log s. (S1, S2, S4)

(b)  If  q/s  increases 

then  try  log(q/s).  (S2) 

5. Sequential laws.

Some subjects tried to find regularities in the sequence
of values of s or q, or both. BACON would not attempt this,
but as we noted earlier, Kepler tried very hard to find a
law for the distances between successive planets.

For  example: 

(1) If s increases as q increases

then check s_{i+1}-s_i. (S1, S2, SW)

(2) If q/s increases

then try q_{i+1}/s_{i+1} - q_i/s_i. (S2)

(3) If s increases as q increases

then check s_{i+1}/s_i, q_{i+1}/q_i. (S3)

(4) If s increases

then try s_{i+1} - x*s_i. (S4)

(5) If s and q increase, and q > s

then check (i+1)s_i - q_i. (S6)

6. Decomposition.

(1) Try s_1 = 3^2 * 2^2. (S2, S8)


7.  Guessing. 

(1) To solve the linear equation ln q + c*ln s = constant, given that |c| > 1,

guess a value of c. (SY)

(2) To fit q = s^x,

guess a value of x. (S1, S4)

(3) To find a function,

seek a law in physics: x = at^2/2 + c,
and try the analogue: q = s^x/y + c. (S1)


8.  Unreasonable  or  faulty  heuristics. 

Some  heuristics  used  by  S6  are  these: 

(1)  If  s  increases  as  q  increases,  then  try  s+q. 

(2)  If  s  increases  as  q  increases,  then  try  s*q. 

(3)  If  s  increases,  then  try  3/s. 

One  of  the  faulty  heuristics  used  by  S9  and  others  is: 

If q/s is not a constant,

then there is no linear relationship between s and q.


From this survey, we can see that the subjects evoked various strategies and
numerous heuristics when they tried to find a law within the given data. BACON's
heuristics were used very frequently by the subjects, although some of the objects to
which BACON's Heuristic 4 or 5 were applied were different from those used by
BACON. These heuristics were evoked in somewhat different ways by BACON and
the human subjects. BACON uses its heuristics recursively, as explained at the
beginning of this paper. The human subjects were not as systematic in their use.
Often after the successful subjects evoked one of these heuristics they did not
immediately follow up the result. Instead, they first tried some other heuristics before
turning back to a new application of the BACON heuristics. Or, like S8, they
sometimes combined hill-climbing with BACON's Heuristic 4. After using one of
BACON's heuristics, unsuccessful subjects generally neither followed up immediately
nor returned to it later.
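
The hill-climbing idea described in this section (repeat a transformation while it
brings a term closer to constancy, reverse it when it leads away) can be sketched in a
few lines. This is only an illustration under our own assumptions, not a model of any
subject's protocol: it climbs on the exponent p in s^p/q, uses the Table 1 data, and
measures "distance from constancy" by a relative spread that we chose for convenience.
All names are hypothetical.

```python
# Illustrative sketch of hill-climbing toward a constant term s^p / q.
# Assumptions (ours, not the paper's): the search variable is the exponent p,
# and constancy is judged by (max - min) / mean of the term's values.
S = [36.0, 67.25, 93.0, 141.75, 483.8]     # distances (Table 1)
Q = [88.0, 224.7, 365.3, 687.0, 4332.1]    # periods (Table 1)


def spread(p):
    """Relative spread of s^p / q over the data; 0 means exactly constant."""
    values = [s ** p / q for s, q in zip(S, Q)]
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def hill_climb_exponent(p=1.0, step=1.0, tolerance=1e-3):
    """Adjust p step by step, keeping moves that make s^p / q more constant."""
    best = spread(p)
    while abs(step) > tolerance:
        candidate = p + step
        candidate_spread = spread(candidate)
        if candidate_spread < best:
            # The move helped: accept it and repeat the same transformation.
            p, best = candidate, candidate_spread
        else:
            # The move led away from constancy: reverse and take a smaller step.
            step = -step / 2
    return p


if __name__ == "__main__":
    print("exponent found:", hill_climb_exponent())
```

Starting from the linear guess p = 1, the procedure rejects p = 2, then closes in on
an exponent near 3/2. The informative part is the feedback: each test reports whether
the term became more or less constant, not merely whether it failed.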
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Discussion  and  General  Summary 

In these experiments, 14 subjects tried to rediscover Kepler's third law using
given data. Four succeeded, one came very close, and the other nine failed. What
do we learn from the behavior patterns observed in these subjects?

We learn, first of all, that data-driven discovery of a scientific law does not call
for unknown or mysterious problem solving processes. Kepler's discovery of his third
law was an event of great significance in the history of science. It is regarded as a
discovery of first magnitude. From the fact that, with given data, four out of 14
subjects could rediscover this law within one hour, and from the search processes
revealed in the subjects' protocols, we can say that some significant discoveries can
be made simply by application of the general processes that have been observed in
all kinds of problem solving.

Generally, the data-driven discovery observed in these experiments is a process
of interactive search of a hypothesis space and an instance space, as proposed by
Simon and Lea (1974). In these experiments, the hypothesis space has two levels:
the level of function type and the parameter level.

There are two stages in the process of discovery: an initial stage of
understanding the problem and the data, and a subsequent stage of search. The
basic search strategy used by the subjects appears to be best-first search, using a
variety of criteria to determine in what direction the search should continue. In
terms of the acquisition of new information, the search can incorporate feedback from
the results of testing hypotheses or can simply employ "succeed-fail" tests. The
effective use of feedback to guide search is a prerequisite for using heuristics
successfully.

In the two-level model there must be guidance for both function selection and
parameter selection. Scatter diagrams and graphs are important tools for function
selection, while abstraction and other transformations of the data, along with
applications of heuristics like those employed in the BACON program, are valuable
tools for parameter selection, even though sometimes subjects, like SY, employed
abstraction to check the function type. Some subjects devote most effort to function
selection, others to parameter selection by manipulation of the data; some combine
both methods.
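
The two-level structure can be made concrete with a small sketch. It is an
illustration under our own assumptions rather than a reconstruction of any subject's
search: the function types are restricted to q = a*s^x, the "fractional power" family
and its grid of exponents are hypothetical choices, the coefficient a is fitted by
least squares, and fit quality is judged by the largest relative error on the Table 1
data.

```python
# Illustrative two-level search: an outer loop over function types and an
# inner loop over the parameters of each type. All names and the candidate
# families are assumptions made for this sketch.
S = [36.0, 67.25, 93.0, 141.75, 483.8]     # distances (Table 1)
Q = [88.0, 224.7, 365.3, 687.0, 4332.1]    # periods (Table 1)

# Function-type level: each type lists the exponents x it allows in q = a*s^x.
FUNCTION_TYPES = {
    "linear": [1.0],
    "quadratic": [2.0],
    "cubic": [3.0],
    "fractional power": [1.0 + k / 10 for k in range(1, 10)],  # 1.1 ... 1.9
}


def fit_scale(f_values, q_values):
    """Least-squares coefficient a for the one-parameter model q = a * f(s)."""
    numerator = sum(f * q for f, q in zip(f_values, q_values))
    denominator = sum(f * f for f in f_values)
    return numerator / denominator


def max_relative_error(a, f_values, q_values):
    """Largest |a*f - q| / q over the observations."""
    return max(abs(a * f - q) / q for f, q in zip(f_values, q_values))


def two_level_search(s_values, q_values):
    """Return (type name, exponent, coefficient, error) of the best candidate."""
    best = None
    for type_name, exponents in FUNCTION_TYPES.items():
        for x in exponents:                      # parameter level
            f_values = [s ** x for s in s_values]
            a = fit_scale(f_values, q_values)
            error = max_relative_error(a, f_values, q_values)
            if best is None or error < best[3]:
                best = (type_name, x, a, error)
    return best


if __name__ == "__main__":
    print(two_level_search(S, Q))
```

A subject who stays at the function-type level would see only that the linear,
quadratic, and cubic types all fail; the parameter-level loop is what locates the
exponent 3/2 inside the power family.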

Large differences are observed among the strategies of different subjects, and
these differences are sufficient to distinguish successful from unsuccessful subjects,
as indicated above. The experimental data do not determine sufficient conditions for
success in data-driven discovery of scientific laws (although of course the behavior of
the successful subjects does exhibit such conditions for this particular law). However,
the data do illustrate some necessary conditions: (1) possessing essential knowledge
of the domain, (2) applying good search strategies, (3) using heuristics appropriately
and systematically, and (4) searching at both function level and parameter level.

We may compare the behavior of the subjects in this experiment with the
behavior of the BACON program when it is given the same task. Kepler's third law
is only one of the laws rediscovered by BACON, and exercises only a subset of
BACON 4's capacities. In trying to rediscover Kepler's third law, most human
subjects need to generate and choose among different function types. BACON,
because of its structure, need not do so. It carries out its search using only linear
functions and ratios.
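
A search of this kind can be sketched as follows. The code is a simplified
illustration loosely modeled on the ratio and product rules discussed in this paper
(Heuristics 4 and 5); it is not the BACON program. We assume that terms are products
of powers s^a * q^b, that the rows of Table 1 are ordered by increasing s, and that a
term counts as constant when its relative spread is below a tolerance we picked
arbitrarily. All names are hypothetical.

```python
# Illustrative BACON-like search over terms of the form s^a * q^b.
# A term is stored as its exponent pair (a, b).
S = [36.0, 67.25, 93.0, 141.75, 483.8]     # distances (Table 1)
Q = [88.0, 224.7, 365.3, 687.0, 4332.1]    # periods (Table 1)


def term_values(term):
    """Values of s^a * q^b for every observation."""
    a, b = term
    return [s ** a * q ** b for s, q in zip(S, Q)]


def is_increasing(values):
    return all(x < y for x, y in zip(values, values[1:]))


def is_decreasing(values):
    return all(x > y for x, y in zip(values, values[1:]))


def relative_spread(values):
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def bacon_like_search(tolerance=0.01, max_terms=10):
    """Combine terms by ratio or product until one is nearly constant."""
    terms = [(1, 0), (0, 1)]                      # start with s and q
    while len(terms) < max_terms:
        newest = terms[-1]
        newest_up = is_increasing(term_values(newest))
        newest_down = is_decreasing(term_values(newest))
        candidate = None
        for other in reversed(terms[:-1]):
            other_up = is_increasing(term_values(other))
            other_down = is_decreasing(term_values(other))
            if (newest_up and other_up) or (newest_down and other_down):
                # Terms rise or fall together: consider their ratio.
                proposal = (other[0] - newest[0], other[1] - newest[1])
            elif (newest_up and other_down) or (newest_down and other_up):
                # One rises as the other falls: consider their product.
                proposal = (other[0] + newest[0], other[1] + newest[1])
            else:
                continue
            if proposal != (0, 0) and proposal not in terms:
                candidate = proposal
                break
        if candidate is None:
            return None                            # nothing new to try
        if relative_spread(term_values(candidate)) < tolerance:
            return candidate                       # nearly constant: a law
        terms.append(candidate)
    return None


if __name__ == "__main__":
    print(bacon_like_search())
```

On the Table 1 data the sketch forms s/q, then s^2/q, then s^3/q^2, and stops
because the last term is nearly constant. No choice among function types is ever
made; the law emerges from repeated ratios and products alone.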

BACON's Heuristics 1, 2, and 3 are used by almost every subject in our
experiments. Successful subjects also used procedures closely resembling Heuristics
4 and/or 5 successfully, Heuristic 4 being used more often than 5. Unsuccessful
subjects used Heuristics 4 and 5 only very unsystematically, or used them
inappropriately. The heuristics used by the subjects are perhaps more general and
flexible than those incorporated in BACON -- for the most part, they can be regarded
as forms of means-ends analysis. Sometimes, they are too general to be effective,
for example, "if a variable increases, try to decrease it; if it decreases, try to
increase it."


Scientific  Discovery  in  History 

One motive for this research, the one we have mainly discussed so far, was to
characterize human problem solving in a data-driven discovery task, where a priori
theory could play no role, and to compare it with the general theoretical model
proposed by Simon and Lea (1974) and the more specific theory implemented in
BACON (Langley et al., 1987).

Another motive was to see what light such an experiment could cast on an
actual historical instance of discovery in a case where there is reasonably good
evidence that the discovery was driven by the data and received no substantial
guidance from relevant pre-existing theory.

The present experiment was preceded by two others, one informal and the
other not yet published, relating to other scientific discoveries. In the informal
experiment, five out of eight subjects, given a qualitative description of the data Max
Planck had available in October 1900, found Planck's law of blackbody radiation in
less than two minutes each. (Planck himself found it, in purely data-driven fashion, in
not more than a few hours.) This experiment and the history behind it are recounted
by Langley et al. (1987, pp. 47-54). The processes used by subjects to find Planck's
law (and the processes that, from the historical evidence, Planck used) are processes
commonly used by skilled applied mathematicians (which the subjects in that
experiment were).

In the other experiment (Kulkarni and Simon, unpublished), a single subject, a
chemical engineering graduate student, employed full-time for the task, succeeded in
rediscovering Balmer's formula for the hydrogen spectrum in about 60 hours' working
time. Balmer accomplished the same task (in 1885) in some weeks of part-time work.
Both the subject and Balmer worked with the same data, and without any relevant
pre-existing theory (none existed in Balmer's time). The tape-recorded protocol and
notebooks of the subject reveal just the same kind of search as we have described
for the subjects in our present experiment, and as is revealed in such documents
as Balmer left behind.

Returning to Kepler, we observed earlier that not too much detail is known
about how he derived the third law; certainly we cannot follow his work on a
day-to-day basis. He wrote quite voluminously, however, about his goals and motivations,
and, in the manner of his age, was explicit about his philosophical assumptions. We have
studied his views with care, especially the Epitome of Copernican Astronomy
(1618-21/1952), and Harmonies of the World (1619/1952), as have a number of
historians of science, and find that these works provide a consistent view of his
procedures.

Kepler's work is characterized by a painstaking attention to data, especially the
magnificent data to which he fell heir on the death of his employer, Tycho Brahe.
The greatest part of his occupation for a quarter century was working these and
earlier data into a parsimonious Copernican description of the heavens. The three
laws that bear his name were essential steps along the way. In these respects
Kepler was a data-driven discoverer of laws.

But Kepler was not satisfied with a mere description of the phenomena -- the
"geometry" as he regarded it. He wanted to trace the behavior of the Sun, stars
and planets to their physical causes. Kepler insistently sought to know not only how
things are, but why they have to be that way. Kepler was deeply concerned with
theory.

What roles did data and theory play in Kepler's discovery of his third law?
First, he did not invent the problem of the relation between the distances of the
planets and their periods. The problem had already been discussed at least as far
back as Aristotle (On the Heavens, Bk. II, ch. 10), and Aristotle had observed that the
outer planets moved more slowly than the inner ones. Second, the rather precise
data that Kepler used to discover the third law were partly products of his successful
investigations of the paths of the planets, in the course of which he found the
elliptical shape of their orbits, and from which he could make accurate calculations
of the mean diameters of their orbits. (Accurate data on the periods of revolution had
already been provided by Brahe and others, and reasonably accurate data on the
diameters by Copernicus.)

It is sometimes argued that the real problem of scientific discovery is not to
find laws in data but to define the problem and to discover the relevant data. But
we have just seen that defining the problem and discovering the data were not
Kepler's primary contribution. The problem of describing the heavens parsimoniously
he inherited from a long line of predecessors, and the data, as explained above, he
mainly inherited from Brahe and Copernicus. His merit was that he converted the
data to a form that revealed the geometry of the heavens and laid the foundation for
Newton's inertial and gravitational explanation. From a scientific standpoint, his
attempts to provide "physical" explanations for his empirically derived laws are now
only historical curiosities.

After he had found the third law, Kepler searched for causes, as he had done
a decade earlier, when he had erroneously concluded that the periods of the planets
varied as the squares of their distances from the Sun. The Sun was the cause,
which as it rotated on its axis swept with it the objects (planets) in the space around
it, with a force that became more feeble with distance. How feeble? Just feeble
enough to account for the observed law. There were enough hypothetical variables --
the sizes of the planets, their masses (unmeasured), the rate of attenuation of the
force -- to account for a linear, a square law or a 3/2 power law, or about any law
that the data revealed.

In the philosophical style of his day, the ad hocness of causal explanations was
no great concern. The data had revealed a pattern, and causes must exist. Kepler's
attitude on this point is quite clear in his treatment of another problem where his
"causes" and the data didn't quite agree. One of his great passions was to explain
the distance between successive planets in terms of spheres inscribed in, and
circumscribed about, the five regular solids (Harmonies of the World, Bk. V, ch. 1-3).
When the data did not fit the hypothesis, Kepler did not dismiss either hypothesis or
data, but openly admitted the discrepancy. Then he sought additional causal forces
(celestial musical harmonies in this case) to remedy the defects. The point is that
regularities of data came first; causes had to be shaped to fit them.

In words Poincare used to discuss difficulties in the development of the theory
of special relativity, "An explanation was necessary, and was forthcoming; they always
are; hypotheses are what we lack the least" (quoted in Miller, 1984, p. 65).

There is every reason to believe, therefore, that Kepler found his third law by
examining the data, much as our subjects did. In 1596, as a young man of 25, he
asked, as did some of our subjects, whether the ratios of the periods of any two
planets might vary as the ratio of their distances. Finding that the ratios of the
periods were too large, he tried alternative functions, and arrived at the quadratic law
(period varies with the square of the distance). He published this formula thirteen
years later, in 1609. Like some of our subjects (e.g., SW, S10), he was then
satisfied with the approximate fit of this formula to the data. Moreover, he tried to
support it by the hypothesis that the period was equal to the ratio of the length of
the orbital path divided by the strength of the Sun's driving force.

By 1618 Kepler, no longer satisfied with the empirical accuracy of the quadratic
law, returned to the problem, and soon found the law we now regard as correct.
According to Gingerich (1975), "Kepler says it was conceived on March 8th of this
year, 1618, unfelicitously submitted to calculation and rejected as false, and recalled
only on May 15 when by a new onset it overcame by storm the darkness of my
mind with such full agreements between this idea and my labor of seventeen years
on Brahe's observations . . ." It is a pity that he did not leave behind a record of
the heuristics he used.

Conclusion 

We have already summarized our empirical findings, and have commented on
their implications both for the theory of discovery as problem solving, and for
historical scientific discoveries. It only remains to put data-driven discovery, like that
examined here, in a broader context of scientific activity.

Science is an incremental, cumulative process. No single step in that process is
"the real discovery." As Langley et al. (1987) point out, scientists define problems
and find new ways of representing them. They generate new phenomena and new
data, sometimes with the help of new instruments they or others have invented. With
the guidance of data or theories, or both, they find new laws to describe data, and
new concepts and mechanisms to explain why the laws hold. They test theories and
communicate their findings. All of these, and perhaps others, are the incremental
steps that make up the cumulative process of scientific discovery.

In this paper, we have examined one class of these incremental steps, data-driven discovery, and have found that it proceeds in the same manner as many other problem-solving processes that have been studied and described. We believe that this result can be generalized to cover most, perhaps all, of the processes of scientific discovery. But of course, to demonstrate that will require carrying out many, many more incremental steps of the same kind.
The Problem Behavior Graphs of Subjects SY and S3
     s          q
    36         88
    67.25     224.7
    93        365.3
   141.75     687
   483.8     4332.1

Table 1: The data given to subjects in Experiment 1
(s = Distance; q = Period of Revolution)
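As an illustration of the regularity hidden in Table 1, the following minimal sketch fits a power law q = k * s^n by least squares on the logarithms of the data. It is not a procedure used by the subjects or by BACON; the function name `fit_power_law` and the choice of a log-log fit are assumptions made only for this example.

```python
import math

# Table 1 as read from the text: distance s and period q for five planets.
S = [36, 67.25, 93, 141.75, 483.8]
Q = [88, 224.7, 365.3, 687, 4332.1]

def fit_power_law(xs, ys):
    """Least-squares fit of ln(y) = ln(k) + n * ln(x); returns (k, n)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    n = sxy / sxx                 # slope of the log-log line = exponent
    k = math.exp(my - n * mx)     # intercept of the log-log line = ln(k)
    return k, n

k, n = fit_power_law(S, Q)
print(f"q is approximately {k:.3f} * s^{n:.3f}")
```

The fitted exponent comes out very close to 3/2, which is the relation the successful subjects had to find without the benefit of such a fitting routine.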


Subject   Situation            The best results

S1        Sophomore            s/q = c, s^2/q = c, s^1.23 = q
S2        Senior               ln q / ln s = c
S3        Freshman             s^3/6.025 = q^2 (correct)
S4        Junior EE            s^1.49 = q (nearly correct)
S5        Freshman             88 = 2*36 + 16 (q_i = k*s_i + b, i = 1,2,3,4)?
                               222-484
SY        Grad in Phys.        q^(2/3) = 0.55 s (correct)
SU        Engineer             s^2/q = c, s^3/q = c
SG        Grad in Art History  q_1/s_1 = x_1*y
                               q_2/s_2 = x_2*y
                               q_3/s_3 = x_3*y
SJ        Grad in Edu.         q = 2x*s + b

Table 2: Subjects and their best results in Experiment 1
(s = Distance; q = Period of Revolution)
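Several entries in Table 2 have the form "some combination of s and q equals a constant c". A minimal sketch of how such candidates can be checked against the data of Table 1 follows. The measure used here (largest value divided by smallest value) and the name `spread` are assumptions for illustration, and S3's entry is read as s^3/6.025 = q^2.

```python
# Table 1 as read from the text.
S = [36, 67.25, 93, 141.75, 483.8]
Q = [88, 224.7, 365.3, 687, 4332.1]

def spread(values):
    """Largest value divided by smallest; 1.0 would mean exactly constant."""
    return max(values) / min(values)

# Candidate "constant" terms of the kind listed in Table 2.
candidates = {
    "s/q": [s / q for s, q in zip(S, Q)],
    "s^2/q": [s ** 2 / q for s, q in zip(S, Q)],
    "s^3/q": [s ** 3 / q for s, q in zip(S, Q)],
    "s^3/q^2": [s ** 3 / q ** 2 for s, q in zip(S, Q)],
}

spreads = {name: spread(vals) for name, vals in candidates.items()}
for name, value in spreads.items():
    print(f"{name:8s} max/min = {value:.3f}")
```

Only s^3/q^2 stays within a fraction of a percent of a single value, and that value is close to the constant 6.025 reported by S3; the other ratios vary by a factor of three or more across the five planets.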
1. FIND-LAWS

If you want to iterate through the values of independent
term I, and you have iterated through all the values of I,
then try to find laws for the dependent values you have
recorded.

2. CONSTANT

If you want to find laws,
and the term D has value V in all data clusters,
then infer that D always has value V.

3. LINEAR

If you want to find laws,
and you have recorded a set of values for the term X,
and you have recorded a set of values for the term Y,
and the values of X and Y are linearly related
with slope M and intercept B,
then infer that a linear relation exists between X and Y
with slope M and intercept B.

4. INCREASING

If you want to find laws,
and you have recorded a set of values for the term X,
and you have recorded a set of values for the term Y,
and the absolute values of X increase
as the absolute values of Y increase,
and these values are not linearly related,
then consider the ratio of X and Y.

5. DECREASING

If you want to find laws,
and you have recorded a set of values for the term X,
and you have recorded a set of values for the term Y,
and the absolute values of X increase
as the absolute values of Y decrease,
and these values are not linearly related,
then consider the product of X and Y.
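The CONSTANT, INCREASING and DECREASING heuristics above can be sketched in a few lines. This is not the BACON program: the LINEAR rule and the "not linearly related" precondition are omitted, terms are stored as hypothetical exponent pairs (a, b) standing for s^a * q^b, and the 1% tolerance in `is_constant` is an assumption chosen for this example.

```python
from itertools import combinations

S = [36, 67.25, 93, 141.75, 483.8]     # distances, as read from Table 1
Q = [88, 224.7, 365.3, 687, 4332.1]    # periods, as read from Table 1

def values(term):
    """Values of the term s^a * q^b for each planet."""
    a, b = term
    return [s ** a * q ** b for s, q in zip(S, Q)]

def is_constant(vals, tol=0.01):
    """CONSTANT: all values agree to within the (assumed) tolerance."""
    return max(vals) / min(vals) - 1 < tol

def trend(xs, ys):
    """Does Y rise or fall as X rises? Returns None if neither."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ys = [ys[i] for i in order]
    if all(a < b for a, b in zip(ys, ys[1:])):
        return "increasing"
    if all(a > b for a, b in zip(ys, ys[1:])):
        return "decreasing"
    return None

def find_law(max_rounds=5):
    terms = [(1, 0), (0, 1)]                        # start with s and q
    for _ in range(max_rounds):
        for x, y in combinations(list(terms), 2):
            t = trend(values(x), values(y))
            if t == "increasing":                   # INCREASING: ratio X/Y
                new = (x[0] - y[0], x[1] - y[1])
            elif t == "decreasing":                 # DECREASING: product X*Y
                new = (x[0] + y[0], x[1] + y[1])
            else:
                continue
            if new in terms or new == (0, 0):
                continue
            if is_constant(values(new)):
                return new                          # exponents of the invariant
            terms.append(new)
    return None

print(find_law())
```

On the data of Table 1 the sketch forms s/q, then s^2/q, and finally the product of those two terms, s^3/q^2, which passes the constancy test.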
                            FUNCTION                               DIAGRAM
         linear  sequential  quadratic  log  cubic  others  total  (graph or scatter)

S1          3        3           2       3     0      2      13       2
S2          1        7           1       4     0      0      13       0.5
S3          5        1           6       0     1      0      13       -
S4          7        2.5         1       1     0      6      17.5     3
SY          3        1.5         1       2     1      0       8.5     1
SU          2        3.5         1       0     2      0       8.5     1
Total      21       18.5        12      10     4      8      73.5     graph 2, scatter 5.5

Table 4: Numbers of references to functions, by type, in Experiment 1

          linear  sequential  quadratic  log    cubic  others  total

S1         23       23          15.5     23      0      15.5    100
S2          7.7     53.8         7.7     30.8    0       0      100
S3         38.5      7.7        46.1      0      7.7     0      100
S4         40       14.3         5.7      5.7    0      34.3    100
SY         35.3     17.6        11.8     23.5   11.8     0      100
SU         23.5     41.2        11.8      0     23.5     0      100
Average    28.6     25.2        16.3     13.6    5.4    10.9    100

Table 5: Function references, percentages of total, in Experiment 1
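The percentages in Table 5 follow from the counts in Table 4 by dividing each count by the row total. A minimal sketch of that arithmetic, using only the S1 row and the Total row of Table 4 as read from the text (the helper name `percentages` is an assumption for this example):

```python
FUNCTION_TYPES = ["linear", "sequential", "quadratic", "log", "cubic", "others"]

# Rows of Table 4 as read from the text.
s1 = [3, 3, 2, 3, 0, 2]             # subject S1, row total 13
totals = [21, 18.5, 12, 10, 4, 8]   # Total row, grand total 73.5

def percentages(counts):
    """Each count as a percentage of the row total, to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

pooled = percentages(totals)
for name, value in zip(FUNCTION_TYPES, pooled):
    print(f"{name:10s} {value}")
print(percentages(s1))
```

The pooled percentages reproduce the "Average" row of Table 5, which suggests that this row is computed from the column totals rather than as the mean of the six subject rows. The S1 row agrees with Table 5 up to rounding.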


Functions include q/s and ln q/ln s; the other column headings are illegible.

Subject   references (in column order, blank cells omitted)   sum    total    %

S1        3, 2, 1, 2, 1                                        9      13     69.2
S2        1, 1, 1, 1                                           4      13     30.8
S3        3, 3, 1, 1                                           8      13     61.5
S4        3, 2, 0.5, 2, 1                                      8.5    17.5   48.6
SY        2, 1, 1, 0.5, 1                                      5.5     8.5   64.7
SU        1, 0, 1, 0.5, 1                                      4.5     8.5   52.9
Total     13, 6, 5, 4.5, 4, 3, 3                              38.5    73.5   52.4

Table 6: References to seven functions, in Experiment 1
Subject   Situation                The best results

S6        Freshman, Phys.          q^2/s^2
S7        Sophomore, EE/Econ.      q2 s2 ----- c
S8        Sophomore, Chem. Eng.    s^(3/2)/q = c (correct)
S9        Grad. in Civil Eng.      q^2 = a*s^3 (correct)
S10       Freshman, Math.          s^q5

Table 7: Subjects and their best results in Experiment 2
(s = Distance; q = Period of Revolution)


                        FUNCTION                              DIAGRAM
         linear  sequential  quadratic  cubic  others  total  graph  scatter

S6          8        4           1        0      2      15      -      1
S7          2        2           2        0      1       7      1      1
S8          2        0           5        3      2      12      -      -
S9          3        1           1        2      1       8      -      1
S10         0        3           3        1      1       8      -      2
Total      15       10          12        6      7      50      1      5

Table 8: Numbers of references to functions, by type, in Experiment 2


          linear  sequential  quadratic  cubic  others  total

S6         53.3     26.7         6.7      0      13.3    100
S7         28.6     28.6        28.6      0      14.2    100
S8         16.7      0          41.6     25      16.7    100
S9         37.5     12.5        12.5     25      12.5    100
S10         0       37.5        37.5     12.5    12.5    100
Average    30       20          24       12      14      100

Table 9: Function references, percentages of total, in Experiment 2
The first function column is q/s; the headings of columns (2) to (6) are illegible.

          q/s   (2)   (3)   (4)   (5)   (6)   sum   total    %

S6         4     3     0     0     1     0     8     15     53.3
S7         2     2     0     1     0     0     5     10     50
S8         1     0     2     0     1     3     7     12     58.3
S9         1     0     0     1     0     1     3      6     50
S10        0     1     1     2     0     0     4      7     57.1
Total      8     6     3     4     2     4    27     50     54

Table 10: References to seven functions, in Experiment 2
FIGURES

Figure 1. The General Model
[Diagram not reproduced; one surviving label reads "Experimentation".]
Productions for Finding a Law

1. The goal is to find a law;
   There is a hypothesis;
   -->
   Test the hypothesis.

2. The goal is to find a law;
   There is a hypothesis;
   The result of testing is "Success";
   -->
   The hypothesis is the law to be found;
   Halt.

3. The goal is to find a law;
   There is a hypothesis;
   The result of testing is "Failure";
   -->
   Set the hypothesis as a used-hypothesis.

4. The goal is to find a law;
   There is no hypothesis;
   -->
   Set subgoal: build a hypothesis.

5. The goal is to find a law;
   There is no hypothesis;
   There is a used-hypothesis;
   -->
   Set subgoal: analyse the used-hypothesis,
   find the trend of the data,
   build a hypothesis.

Productions for Building a Hypothesis

6. The goal is to build a hypothesis;
   -->
   Try to find the trend of the data.

7. The goal is to build a hypothesis;
   There is a trend;
   -->
   Form a hypothesis.

8. The goal is to build a hypothesis;
   There is a trend;
   -->
   Select a function type.

9. The goal is to build a hypothesis;
   There is a function type;
   -->
   Check the function type.

10. The goal is to build a hypothesis;
    There is a function type;
    The result of checking the function type is "Failure";
    -->
    Delete the function type.

11. The goal is to build a hypothesis;
    There is a function type;
    There is a trend;
    -->
    Select a set of parameters, form a hypothesis.

12. The goal is to build a hypothesis;
    There is a function type;
    There is a used-hypothesis;
    There is a set of parameters;
    -->
    Change the parameters, form a hypothesis.

13. The goal is to build a hypothesis;
    There is a trend;
    -->
    Select a function (including the parameters),
    form a hypothesis.

Productions for Finding a Trend

14. Try to find the trend of the data;
    -->
    Draw a diagram, analyse the diagram,
    set the trend of the data.

15. Try to find the trend of the data;
    -->
    Transform the data, set the trend of the data.

Figure 2. The production system of the detailed model
Figure 3. S3's Simplified PBG